The two purposes of this investigation were to study
the effects of variance heterogeneity on three selected multiple
comparison procedures and to determine if either of two nonstandard
methods would be superior to the conventional methods based on mean
square within. The three procedures studied were the Wholly
Significant Difference Test (WSD), the "S" test, and a simple
multiple "t" test (MTT) procedure. The investigation was a computer
simulation consisting of 1000 experiments with four independent
samples of five data points. Six pairwise contrasts were considered.
The four variance conditions (VC) constituted one factor of the
design. Each of the six contrasts was tested using three methods.
The three methods constituted a second factor in the two-factor
design with VC crossed with method. Results are tabulated and
discussed. (DB)
Educational and psychological researchers often deal with groups
that tend to be heterogeneous in variability. The purpose of this
investigation was to study the effects of variance heterogeneity on three
selected multiple comparison procedures, and to determine if either of
two non-standard methods would be superior to the conventional methods
based on mean square within.

The three procedures studied were the Wholly Significant Difference
Test (WSD) developed by Tukey (1953), the S test presented by Scheffé
(1953), and a simple multiple t test (MTT) procedure. These three were
selected because each controls a distinctly different Type I error rate.
The WSD controls the familywise Type I error rate (FWI), the MTT
controls the contrast significance rate, and the S test controls the risk
of finding at least one significant contrast in a set of all possible
contrasts. Games (1971a, 1971b) showed that all three procedures use
the same statistic and hence any differences between the three procedures
are due to the use of different critical values (CV).
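
Because the three procedures share one statistic, their difference can be
illustrated purely as a comparison of critical values. The sketch below
assumes alpha = .05, k = 4 groups, and 16 error df (the pooled-error case),
and it hard-codes standard table values rather than computing quantiles.
The constant and function names are illustrative assumptions, not part of
the original study.

```python
import math

# Standard .05-level table values for 16 error df (assumed inputs taken
# from common statistical tables, not from the original paper).
T_975_DF16 = 2.120      # Student t, upper .025 point
Q_95_K4_DF16 = 4.05     # studentized range for k = 4 means
F_95_DF3_16 = 3.24      # F with (k - 1, 16) = (3, 16) df


def critical_values(k=4):
    """Return critical values on the common t scale for MTT, WSD and S."""
    return {
        "MTT": T_975_DF16,                       # per-contrast t
        "WSD": Q_95_K4_DF16 / math.sqrt(2.0),    # studentized range / sqrt(2)
        "S": math.sqrt((k - 1) * F_95_DF3_16),   # Scheffe: sqrt((k - 1) F)
    }


cv = critical_values()
for name in ("MTT", "WSD", "S"):
    print(f"{name}: {cv[name]:.3f}")
```

The ordering MTT < WSD < S reflects the increasingly broad family of
contrasts over which each procedure controls the Type I error rate.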
Method

This investigation was a computer simulation consisting of 1000
experiments with four independent samples (k = 4) of five data points
(n = 5). Six pairwise contrasts were considered. Each data point,
selected at random from a population of 10,000 normal deviates, was
scaled by multiplying it by a specifically chosen constant to create
variance ratios of 1:1:1:1 (VC-1), 1:3:5:7 (VC-2), 1:1:7:7 (VC-3), and
1:1:1:13 (VC-4). The four variance conditions (VC) constituted one
factor of the design.
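
The scaling step can be sketched as follows. This is a minimal
illustration under two assumptions that are not stated in the text: the
scaling constant is taken to be the square root of the desired variance
ratio, and deviates are drawn directly from a unit normal distribution
rather than from a stored population of 10,000 values. The names
`VARIANCE_RATIOS` and `draw_experiment` are hypothetical.

```python
import math
import random

VARIANCE_RATIOS = {
    "VC-1": (1, 1, 1, 1),
    "VC-2": (1, 3, 5, 7),
    "VC-3": (1, 1, 7, 7),
    "VC-4": (1, 1, 1, 13),
}


def draw_experiment(ratios, n=5, rng=random):
    """Draw one sample of n scaled unit-normal deviates per variance ratio."""
    samples = []
    for ratio in ratios:
        scale = math.sqrt(ratio)  # multiplying by sqrt(r) yields variance r
        samples.append([scale * rng.gauss(0.0, 1.0) for _ in range(n)])
    return samples


rng = random.Random(1973)  # arbitrary seed, used only for reproducibility
groups = draw_experiment(VARIANCE_RATIOS["VC-3"], rng=rng)
print([len(g) for g in groups])
```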

Each of the six contrasts was tested using three methods. Method
MSW consisted of the conventional test using the square root of two
Mean Square Within divided by n as the denominator of a t with df = 16.
Method t used the standard error of the common t test with df = 8.
Method BF used the Behrens-Fisher t' statistic defined as

$$t' = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{s_i^2/n_i + s_j^2/n_j}}$$

with df given by the Welch solution (Winer, 1962, p. 37) recommended
by Scheffé (1970) and Wang (1971). The
three methods constitute a second factor in the two-factor design with
VC crossed with method.
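
The three error terms described above can be compared side by side for a
single pairwise contrast. The sketch below assumes equal n and uses the
Welch-Satterthwaite approximation for the BF degrees of freedom; that
specific form of the "Welch solution" is an assumption, and the function
name and example data are hypothetical.

```python
import math
import statistics


def contrast_tests(groups, i, j):
    """Return {method: (statistic, df)} for the contrast of groups i and j."""
    n = len(groups[0])          # equal n assumed, as in the study
    k = len(groups)
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    diff = means[i] - means[j]

    # MSW: variance pooled over all k groups, df = k(n - 1).
    msw = sum(variances) / k
    t_msw = diff / math.sqrt(2.0 * msw / n)

    # t: variance pooled over the two groups in the contrast, df = 2n - 2.
    pooled = (variances[i] + variances[j]) / 2.0
    t_two = diff / math.sqrt(2.0 * pooled / n)

    # BF: separate variances; df from the Welch-Satterthwaite approximation
    # (an assumed form of the Welch solution).
    vi, vj = variances[i] / n, variances[j] / n
    t_bf = diff / math.sqrt(vi + vj)
    df_bf = (vi + vj) ** 2 / (vi ** 2 / (n - 1) + vj ** 2 / (n - 1))

    return {
        "MSW": (t_msw, k * (n - 1)),
        "t": (t_two, 2 * n - 2),
        "BF": (t_bf, df_bf),
    }


# Made-up data: group 3 is far more variable than the others.
example_groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [0.0, 5.0, 10.0, 15.0, 20.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
]
result = contrast_tests(example_groups, 0, 2)
for method, (stat, df) in result.items():
    print(f"{method}: statistic = {stat:.3f}, df = {df:.2f}")
```

With equal n the t and BF statistics coincide, so the two methods differ
only in the degrees of freedom used to look up the critical value.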

Conditions in which the null hypothesis is false were created by adding
a constant to each data point. A uniform distribution was used, spreading
the population means equally apart. The distance between the means
was calculated by a formula modified from Games and Lucas (1966, p. 317).

Five dependent measures were recorded simultaneously: the average
contrast Type I rate (P), the familywise Type I rate (FWI), and the
Type I rates of three specific contrasts, X1 - X4, X1 - X2, and X3 - X4.




A two-factor within-replications analysis of variance (AOV) was
performed for each of the five dependent variables, for the null
hypothesis condition, both VC and SE (the standard error method) being
repeated measures factors. The analysis was performed on the IBM 360/67
using the library routine ANOVR (I . .a, 1968).

Results 

Per comparison rates are presented in Table 1, one for each method.
With the exception of the .084 value for MSW under VC-4,
the deviations of P from .05 are relatively minor. Furthermore, as
expected, the use of MSW produces the greatest power under the
homogeneous condition (VC-1). Only under the most extreme variance
condition (VC-4) did the use of MSW produce an inflated Type I error rate
(and a lower power curve). Therefore, using P as a dependent measure
suggests the universal use of MSW in all but extremely heterogeneous
conditions. The robustness of the common multiple comparison solution
can be defended on the basis of P data, at least for the equal n case.
However, the above analysis is superficial since control of P does
not imply adequate control over the individual rates from which the
average was obtained. Three specific contrasts were isolated (X1 - X4,
X1 - X2, and X3 - X4) as the most interesting in terms of the manner in
which the heterogeneous conditions were established.

As an example, assume a researcher tested the difference between
the means of the third and fourth groups (X3 - X4) and that between the
first and second groups (X1 - X2) on a priori experimental grounds, using
the pooled variance estimate MSW. The values for the two contrasts that
he selected (from Table 1) are presented below. In every heterogeneous
variance condition, P underestimates the rate for the X3 - X4 contrast
and overestimates it for the X1 - X2 contrast.

Variances      P        X3 - X4    X1 - X2

1:1:1:1       .048       .052      [illegible]
1:3:5:7       .054       .105       .007
1:1:7:7       .064       .140       .000
1:1:1:13      .084       .161       .004
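
The divergence of individual contrast rates under a pooled error term can
be reproduced with a small Monte Carlo sketch. The code below is not the
original program: it assumes scaling by the square root of the variance
ratio, draws unit normal deviates directly, uses the two-sided .05 table
value of t with 16 df, and runs 2000 experiments with an arbitrary seed.
Group indices are 0-based, so `(2, 3)` is the X3 - X4 contrast and
`(0, 1)` is the X1 - X2 contrast.

```python
import itertools
import math
import random
import statistics

T_CRIT_DF16 = 2.120  # two-sided .05 critical value of t with 16 df (table)


def msw_type1_rates(ratios, n=5, experiments=2000, seed=0):
    """Estimate per-contrast Type I rates when every test uses MSW."""
    rng = random.Random(seed)
    k = len(ratios)
    pairs = list(itertools.combinations(range(k), 2))
    rejections = dict.fromkeys(pairs, 0)
    for _ in range(experiments):
        # All population means are zero, so every rejection is a Type I error.
        groups = [[math.sqrt(r) * rng.gauss(0.0, 1.0) for _ in range(n)]
                  for r in ratios]
        means = [statistics.fmean(g) for g in groups]
        msw = statistics.fmean([statistics.variance(g) for g in groups])
        se = math.sqrt(2.0 * msw / n)
        for i, j in pairs:
            if abs(means[i] - means[j]) / se > T_CRIT_DF16:
                rejections[(i, j)] += 1
    return {pair: count / experiments for pair, count in rejections.items()}


rates = msw_type1_rates((1, 1, 7, 7))   # the VC-3 condition
p_bar = statistics.fmean(rates.values())
for pair, rate in rates.items():
    print(pair, f"{rate:.3f}")
print("average over six contrasts:", f"{p_bar:.3f}")
```

Under these assumptions the estimated rate for the two high-variance
groups should sit well above .05 and the rate for the two low-variance
groups close to zero, while their average stays much nearer the nominal
level.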

The P results hide the great inaccuracies on individual contrasts.
P = .064 may be obtained as the average of two contrasts with P(EI) = .14,
two with P(EI) = .000, and two with P(EI) = .063 (as suggested by the results
on VC-3). However, if the researcher above tested his means using the
t test, he would achieve more uniform control. Using the Behrens-Fisher
method results in P(EI) ≤ α regardless of the individual contrast selected,
as shown below.
An AOV was performed using the three individual contrasts as a
third within-replications factor (CON) with VC and method. As expected,
the analysis resulted in a significant CON x VC x method interaction
(F = 72.9; df = 12, 38; p < .001). The CON x VC interaction was different
for each method. A two-factor repeated measures design with CON and VC
as the two factors was conducted for each method. When using MSW there was
a wide difference in P(EI) in the three contrasts as the variance
condition changed. There was a significant CON x VC interaction (F = 148.9;
df = 6, 18; p < .001). Using the t method produced considerable
improvement but still resulted in a significant CON x VC interaction
(F = 64.324; df = 6, 18; p < .001). However, for the BF method no
significant differences were noted for either factor or the interaction.
This result suggests that using the BF method when violation of the
homogeneity of variance assumption is suspected will result in the
over-all stability of Type I error around α.

Figure 1 illustrates the power curves for two contrasts, X1 - X2 and
X3 - X4, when the variance ratio 1:1:7:7 (VC-3) was used. The theoretical
normal curve power solutions for these two contrasts have also been
inserted. The normal curve solutions are available since the population
variances are known. These power curves are higher than similar power
curves using the proper t distribution. Some points for the t solutions
are not available due to limitations in available tables. The power
curves for the contrasts using MSW greatly diverge from the theoretical
curves using the known population variances. The obtained curve on
X1 - X2 remained conservative throughout when MSW was used. When the BF
was used the power rose to .57 and roughly paralleled the theoretical
curve. Under these conditions it is evident that the power curves using
MSW are not acceptable. The fact that the two values of P(EI) average
to .07 is not evidence that the MTT or any other technique based on MSW
is robust to heterogeneous variances.

The Familywise Rate

The familywise Type I significance rate (FWI) is the proportion
of sets of contrasts that contain at least one significant result.
While P makes little sense when the individual rates differ, FWI is
meaningful regardless of the equality of individual rates. The maximum
value of any individual rate establishes the minimum value of the FWI.
The maximum value (.162), found under the individual rate X1 - X4 for
MTT using MSW, is the lowest possible value for FWI under this condition.
In general, the more contrasts there are in a set the greater FWI
becomes. The Tukey WSD was designed to control the FWI for such a
set, i.e., the selection of α for the WSD specifies the theoretical
probability of at least one Type I error for the set of contrasts when
all of the assumptions are met.
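
The relation between the individual rates, their average P, and FWI can
be made concrete with a small worked example. The rejection flags below
are made up for illustration, and the function name `error_rates` is
hypothetical; the point is only that FWI can never fall below the largest
individual rate.

```python
def error_rates(rejections):
    """Compute per-contrast rates, their average, and the familywise rate.

    rejections: one list per experiment of 0/1 flags, one flag per contrast.
    """
    n_exp = len(rejections)
    n_con = len(rejections[0])
    per_contrast = [sum(exp[c] for exp in rejections) / n_exp
                    for c in range(n_con)]
    p_bar = sum(per_contrast) / n_con
    # An experiment counts toward FWI if any of its contrasts was rejected.
    fwi = sum(1 for exp in rejections if any(exp)) / n_exp
    return per_contrast, p_bar, fwi


# Hypothetical flags for 5 experiments and 3 contrasts under a true null.
flags = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
per_contrast, p_bar, fwi = error_rates(flags)
print(per_contrast, round(p_bar, 3), fwi)
```

Here three of the five experiments contain at least one rejection, so FWI
exceeds every individual rate and is bounded above by their sum.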
When MSW was used, as in the conventional form of the WSD test
(Miller, 1966; Winer, 1962), heterogeneous variances increased the
FWI above the .05 level. When the t method was used there was only a
slight increase in FWI over the four variance conditions. When the BF
was used the FWI values were conservative, with one value falling
significantly below the theoretical .05 level.

Taking the probability of rejecting at least one contrast when the
null hypothesis is false as familywise power, the
MSW method had higher power than the other two methods in the homogeneous
condition. As the variance condition became more heterogeneous, MSW
lost its superiority in terms of FWI power, but became inferior only
in the extreme heterogeneous condition.

The Scheffé S test was designed to control FWI on a set of all
possible contrasts. Just as P is less than α for the WSD, so the FWI
is less than α for the S test. Otherwise, the results of the FWI
analysis for Scheffé were consistent with those of the WSD. The same
trends were found but with lower overall proportions of rejection of
H0.

Discussion 

The results above indicate that when variance heterogeneity exists,
using a pooled error term as in the MSW method is inappropriate. For
various individual contrasts, P(EI) will be inflated while for other
contrasts P(EI) will approach zero. The use of MSW produces major
distortions in many of the individual contrast power curves. These results
will not be alleviated by merely increasing the common sample size.



The results also indicate that the use of a non-pooled error term,
as in the t or BF methods, provides improved control of P(EI) for all
contrasts and an acceptable power curve for all contrasts.

The decision to use MSW or the other methods involves risk either
way. The use of MSW when inappropriate risks uncontrolled P(EI) and
misleading power curves. Not using MSW under homogeneous variance
conditions risks only a slight depression in P(EI) and a comparatively
slight uniform loss in power. If the universal use of one method is
desired, then that method should be the Behrens-Fisher method.

Testing for variance heterogeneity or extensive experience with the
variables might provide evidence for deciding on a method. However,
caution should be used in testing variance homogeneity since many
tests are sensitive to violations of assumptions regarding the form of
the population (Box, 1953; Games, Winkler and Probert, in press).

The FWI results show that the use of MSW is not as disadvantageous
on this overall index as it is with respect to individual comparisons
when ni = nj = n. The WSD using MSW is as robust to heterogeneous variances
as is the analysis of variance. FWI will be somewhat inflated with
heterogeneous variances and equal n, but the inflation is at least limited
to 2α or 3α as a maximum. Unfortunately, this is not a great deal of
consolation. The overall control of FWI is achieved by using a
conservative critical value that substantially reduces power on each
contrast. The phenomenon still exists that various individual contrasts
are being tested, often with an inappropriate error term, and that the
risk of Type II errors will often be close to 1.0 for many substantial
contrasts.

Fortunately, the same technique that improves control of P(EI)
on each contrast also improves the robustness of FWI. The BF method
applied to each mean difference yields an FWI that is somewhat
conservative when the homogeneous variance condition exists (.038) but which
never rises above the theoretical value of α even if the assumption
is not true. This is in contrast to both the MSW and t methods. The
BF method may be recommended for the control of either P(EI) or FWI,
although the researcher will experience some decrease in power when the
homogeneous variance condition is true. Historically, the major
disadvantage of using either the t or BF methods has been the increased
amount of calculation necessary. With the now widespread use of
computers, either method can be incorporated into general purpose programs.

With equal n's, as used in this study, the computed t and BF
statistics will be identical and the only difference between the two
methods is in the critical values used. The critical value of t is
from the t sampling distribution with df = 2n - 2, i.e., t(α/2, 2n - 2).
The critical value of BF varies from this value (as a lower limit) to
an upper limit of t(α/2, n - 1). The actual df specified by the Welch
solution varies from sample to sample. Thus the only distinction
between t and BF in this study is in the fact that the BF solution may
have a larger critical value that overcomes the slight positive bias
in the t statistic when the homogeneity of variance assumption is
violated (Box, 1953). As the sample size increases, the difference
between t(α/2, 2n - 2) and t(α/2, n - 1) decreases and the results
for the t and BF methods would be similar, with both P(EI) and FWI
approaching their theoretical values. However, in a computer solution
it is appropriate to use the best method for any unspecified n. The
Behrens-Fisher method with the Welch solution for critical values is
best for the small n situation, and hence is recommended.
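
The limits on the BF degrees of freedom described above can be checked
numerically. The sketch assumes the Welch-Satterthwaite form of the Welch
solution, which is consistent with the stated limits of n - 1 and 2n - 2
but is not spelled out in the text; the function name `welch_df` and the
variance ratios tried are illustrative.

```python
def welch_df(var_i, var_j, n):
    """Welch-Satterthwaite df for two samples of equal size n (assumed form)."""
    vi, vj = var_i / n, var_j / n
    return (vi + vj) ** 2 / (vi ** 2 / (n - 1) + vj ** 2 / (n - 1))


n = 5
dfs = {}
for ratio in (1, 3, 7, 13, 100):
    # ratio is the sample variance of group j relative to group i.
    dfs[ratio] = welch_df(1.0, float(ratio), n)
    print(f"variance ratio {ratio:>3}: df = {dfs[ratio]:.2f}")
```

For n = 5 the df moves from 8 when the two sample variances are equal
toward 4 as they become very unequal. The corresponding two-sided .05
table values of t are 2.306 (8 df) and 2.776 (4 df), which is the range of
critical values available to the BF method in this study.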